Chinese Word Segmentation based on Mixing Model
نویسندگان
چکیده
This paper presents our recent work for participation in the Second International Chinese Word Segmentation Bakeoff. According to difficulties, we divide word segmentation into several sub-tasks, which are solved by mixed language models, so as to take advantage of each approach in addressing special problems. The experiment indicated that this system achieved 96.7% and 97.2% in F-measure in PKU and MSR open test respectively.
منابع مشابه
A Pragmatic Chinese Word Segmentation Approach Based on Mixing Models
A pragmatic Chinese word segmentation approach is presented in this paper based on mixing language models. Chinese word segmentation is composed of several hard sub-tasks, which usually encounter different difficulties. The authors apply the corresponding language model to solve each special sub-task, so as to take advantage of each model. First, a class-based trigram is adopted in basic word s...
متن کاملChinese Word Segmentation based on Mixing Multiple Preprocessor and CRF
This paper describes the Chinese Word Segmenter for our participation in CIPSSIGHAN-2010 bake-off task of Chinese word segmentation. We formalize the tasks as sequence tagging problems, and implemented them using conditional random fields (CRFs) model. The system contains two modules: multiple preprocessor and basic segmenter. The basic segmenter is designed as a problem of character-based tagg...
متن کاملChinese Word Segmentation Based On Direct Maximum Entropy Model
Chinese word segmentation is a fundamental and important issue in Chinese information processing. In order to find a unified approach for Chinese word segmentation, the author develop a Chinese lexical analyzer PCWS using direct maximum entropy model. The paper presents the general description of PCWS, as well as the result and analysis of its performance at the Second International Chinese Wor...
متن کاملCombining Character-Based and Subsequence-Based Tagging for Chinese Word Segmentation
Chinese word segmentation is the initial step for Chinese information processing. The performance of Chinese word segmentation has been greatly improved by character-based approaches in recent years. This approach treats Chinese word segmentation as a character-wordposition-tagging problem. With the help of powerful sequence tagging model, character-based method quickly rose as a mainstream tec...
متن کاملReport to BMM-based Chinese Word Segmentor with Context-based Unknown Word Identifier for the Second International Chinese Word Segmentation Bakeoff
This paper describes a Chinese word segmentor (CWS) based on backward maximum matching (BMM) technique for the 2 nd Chinese Word Segmentation Bakeoff in the Microsoft Research (MSR) closed testing track. Our CWS comprises of a context-based Chinese unknown word identifier (UWI). All the context-based knowledge for the UWI is fully automatically generated by the MSR training corpus. According to...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005